ggplot2 is one of the most elegant and aesthetically pleasing graphics frameworks available in R. The way you make plots in ggplot2 is very different from base graphics, which makes the learning curve steep. That said, it’s totally worth it.
#Within each document, load the ggplot2 package so R knows you will be using functions, data, etc. from that package
library(ggplot2)
library(tidyverse)
## ── Attaching packages ──────────────────────── tidyverse 1.3.0 ──
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ✔ purrr 0.3.3
## ── Conflicts ─────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(RColorBrewer)
It’s essential that you properly organize your data into a data frame before you start with ggplot2. This is why we spent the last week or two learning ways to transform and wrangle data into different formats.
Once you have your data ready to go, you gradually add bits and pieces to it to create a plot. Plots are built up in layers, each one added to the previous with +.
We will be working with the dataset mpg, which comes with ggplot2. It contains fuel economy data from 1999 to 2008 for 38 popular models of cars.
data(mpg)
# ggplot ( dataframe, aes(x=xvariable, y=yvariable))
# aes stands for aesthetics
# initial ggplot
ggplot(mpg, aes(x=cty, y=hwy))
A blank ggplot is drawn. Even though x and y are specified, there are no points or lines in it. This is because ggplot doesn’t assume that you meant a scatterplot or a line chart to be drawn. I have only told ggplot what dataset to use and what columns should be used for the x and y axes. I haven’t explicitly asked it to draw any points.
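To see this layering concretely, the following minimal sketch stores the plot in a variable and counts its layers before and after a geom is added. The names p_blank and p_points are hypothetical, and peeking at the layers element of a plot object is just one informal way to look inside it.

```r
library(ggplot2)

# Data and aesthetic mappings only: no geom layer has been added yet
p_blank <- ggplot(mpg, aes(x = cty, y = hwy))
length(p_blank$layers)   # number of layers in the blank plot

# Adding geom_point() appends one layer that actually draws the points
p_points <- p_blank + geom_point()
length(p_points$layers)  # one more layer than before
```

The blank plot has no layers, which is why only the axes and background appear until a geom is added.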
The basics:
ggplot(mpg, aes(x=cty, y=hwy)) +
geom_point()
## OR pipe the data frame into ggplot(); layers are still added with +
mpg %>%
ggplot(aes(x=cty, y=hwy)) +
geom_point()
To customize colors, plotting characters, size:
ggplot(mpg, aes(x=cty, y=hwy)) +
geom_point(col="steelblue", pch=18, size=2)
A list of possible pch values
Let’s make a scatterplot on top of the blank ggplot by adding points using a geom layer called geom_point.
ggplot(mpg, aes(x=cty, y=hwy)) +
geom_point(col="steelblue", pch=18, size=2) +
labs(title="Scatterplot", subtitle="City MPG vs. Highway MPG", y="Highway MPG", x="City MPG", caption="source: mpg")
# + xlab("City MPG")
# + ylab("Highway MPG")
# + xlim(c(0,30))
# + ylim(c(0,40))
If you set axis limits with xlim() and ylim(), ggplot2 gives a warning because adjusting the x and y axes excludes some points.
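As a rough illustration of why rows get removed, the sketch below counts the cars that fall outside limits of 0 to 30 for cty and 0 to 40 for hwy, and contrasts xlim()/ylim() with coord_cartesian(), which zooms the view without discarding data. The variable names are hypothetical.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()

# Cars outside the limits; these are the rows dropped (with a warning) by xlim()/ylim()
n_outside <- sum(mpg$cty > 30 | mpg$hwy > 40)
n_outside

# xlim()/ylim() remove out-of-range data before drawing
p_limited <- p + xlim(c(0, 30)) + ylim(c(0, 40))

# coord_cartesian() only zooms the view; no rows are removed
p_zoomed <- p + coord_cartesian(xlim = c(0, 30), ylim = c(0, 40))
```

If the goal is only to zoom in, coord_cartesian() is usually the safer choice, because statistics such as smoothers or boxplots are still computed from all of the data.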
gg <- ggplot(mpg, aes(x=cty, y=hwy)) +
geom_point(aes(col=class), pch=18, size=2) +
labs(title="Scatterplot", subtitle="City MPG vs. Highway MPG", y="Highway MPG", x="City MPG", caption="source: mpg")
gg
As an added benefit, the legend is added automatically. If needed, it can be removed by setting legend.position to "none" from within a theme() function.
gg + theme(legend.position="none") # remove legend
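The scatterplots above differ in where col is given: outside aes() it sets one fixed color for every point, while inside aes() it maps color to a column and produces a legend. A minimal side-by-side sketch, with hypothetical names p_set and p_mapped:

```r
library(ggplot2)

# Setting: every point gets the same fixed color, so no legend is drawn
p_set <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(col = "steelblue")

# Mapping: color depends on the class column, so ggplot2 builds a legend
p_mapped <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(aes(col = class))
```

A common slip is writing aes(col = "steelblue"), which treats the string as a one-level variable rather than as a color name.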
You can also change the color palette entirely.
gg + scale_colour_brewer(palette = "Spectral") # change color palette
More such palettes can be found in the RColorBrewer package.
RColorBrewer palettes
You can also build your own color palettes using the built-in colors in R or by using HEX codes (i.e., #RRGGBB).
R Built In Colors
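A hand-built palette might look like the sketch below, which mixes built-in color names and HEX codes and passes them to scale_colour_manual(). The vector name my_colors and the particular colors are arbitrary choices for illustration; the only real requirement is one color per level of class.

```r
library(ggplot2)

# One color per vehicle class (mpg has seven classes): names and HEX codes can be mixed
my_colors <- c("steelblue", "plum", "darkorange", "forestgreen",
               "#E41A1C", "#984EA3", "#A65628")

ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(aes(col = class), pch = 18, size = 2) +
  scale_colour_manual(values = my_colors)
```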
We will spend more time later in the course discussing best practices for color choices, but for now keep in mind:
ggplot(mpg, aes(x=cty, y=hwy, label=model)) +
geom_jitter(aes(col=class), pch=18, size=2) +
geom_text(size=1, hjust=0, vjust=0)
Themes can be a useful way to “style” an entire graph at once. Common themes are theme_classic(), theme_dark(), theme_bw(), and theme_grey().
gg + theme_bw()
The ggthemes package contains lots of additional themes, including theme_wsj() (Wall Street Journal), theme_economist() (The Economist), theme_fivethirtyeight() (FiveThirtyEight), etc.
library(ggthemes) #make sure you have run install.packages("ggthemes") on your computer at some point
gg + theme_wsj() + scale_color_wsj()
## Warning: This manual palette can handle a maximum of 6 values. You have
## supplied 7.
## Warning: Removed 62 rows containing missing values (geom_point).
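The two warnings above appear to be related: class has more levels than the palette behind scale_color_wsj() has colors, so one class is left without a color and its points are dropped. A quick check with base R, offered as a sketch rather than a definitive diagnosis (the variable names are hypothetical):

```r
library(ggplot2)

# Number of distinct vehicle classes that need a color
n_classes <- length(unique(mpg$class))
n_classes

# Rows per class; compare these counts with the number of removed rows in the warning
class_counts <- table(mpg$class)
class_counts
```

If the count for one class matches the 62 removed rows, that class is the one that received no color. A palette with at least seven colors, such as the Spectral palette used earlier, avoids the problem.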
Histograms should be used for one continuous variable.
ggplot(mpg, aes(cty)) +
scale_fill_brewer(palette = "Spectral") +
geom_histogram() + # default binning (30 bins)
labs(title="Histogram with Auto Binning",
subtitle="City MPG")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(mpg, aes(cty)) +
scale_fill_brewer(palette = "Spectral") +
geom_histogram(binwidth=2) + # change binwidth
labs(title="Histogram with Fixed Bin Width",
subtitle="City MPG")
ggplot(mpg, aes(cty)) +
scale_fill_brewer(palette = "Spectral") +
geom_histogram(binwidth=2) + # change binwidth
labs(title="Histogram with Fixed Bin Width",
subtitle="City MPG") +
coord_trans(x="log10")
Boxplots should be used for one continuous variable. Side-by-side boxplots are good for comparing a numerical variable across the levels of a categorical variable.
ggplot(mpg, aes(class, cty)) +
geom_boxplot( fill="plum", outlier.size=1) +
labs(title="Box plot",
subtitle="City Mileage grouped by Class of vehicle",
caption="Source: mpg",
x="Class of Vehicle",
y="City Mileage")
mpg %>%
mutate(class = reorder(class, cty, median )) %>%
ggplot(aes(class, cty)) +
geom_boxplot( fill="plum", outlier.size=1) +
labs(title="Box plot",
subtitle="City Mileage grouped by Class of vehicle",
caption="Source: mpg",
x="Class of Vehicle",
y="City Mileage")
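The second boxplot above uses reorder() so that the classes appear sorted by their median city mileage instead of alphabetically. The sketch below shows what reorder() does to the factor levels on its own; class_by_median is a hypothetical name.

```r
library(ggplot2)

# reorder() returns a factor whose levels are sorted by median cty within each class
class_by_median <- reorder(mpg$class, mpg$cty, median)
levels(class_by_median)

# Median city mileage per class, listed in the new level order
tapply(mpg$cty, class_by_median, median)
```

Because ggplot2 draws categorical axes in level order, changing the levels is enough to change the order of the boxes.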
Barplots should be used for one or two categorical variables.
ggplot(mpg, aes(manufacturer)) +
geom_bar() +
theme(axis.text.x = element_text(angle=90)) +
labs(title="Barplot on One Categorical Variable",
subtitle="Manufacturer across Vehicle Classes")
## OR
ggplot(mpg, aes(x = manufacturer)) +
geom_bar(fill="blue") +
#+ theme(axis.text.x = element_text(angle=90))
labs(title="Barplot on One Categorical Variable",
subtitle="Manufacturer across Vehicle Classes") +
coord_flip()
#+ scale_fill_brewer(palette = "Spectral")
ggplot(mpg, aes(manufacturer)) +
geom_bar(aes(fill=class)) +
labs(title="Barplot on Two Categorical Variables",
subtitle="Manufacturer across Vehicle Classes") +
theme_classic() +
theme(axis.text.x = element_text(angle=90)) +
scale_fill_brewer(palette = "Spectral")
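In the barplots above, geom_bar() counts the rows in each category for you. When the counts are already computed, geom_col() can be used instead. A minimal sketch of the equivalent two-step approach, assuming dplyr is loaded (it is attached by tidyverse) and using the hypothetical name manufacturer_counts:

```r
library(ggplot2)
library(dplyr)

# Count the rows for each manufacturer ourselves; count() stores the result in column n
manufacturer_counts <- mpg %>%
  count(manufacturer)

# geom_col() draws bar heights directly from a y variable instead of counting rows
ggplot(manufacturer_counts, aes(x = manufacturer, y = n)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90))
```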
There are so many different ways to modify the themes - the legend, where the axis ticks go, the background colors, the position of text, the font, etc. You can get the full scope of all the options by typing ?theme into the console.
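As a small taste of what theme() can do, the sketch below changes a handful of elements at once. The specific settings and the name p_themed are arbitrary choices for illustration, not recommendations.

```r
library(ggplot2)

p_themed <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(aes(col = class)) +
  labs(title = "Scatterplot") +
  theme(
    legend.position = "bottom",                             # move the legend below the plot
    plot.title = element_text(face = "bold", hjust = 0.5),  # bold, centered title
    panel.grid.minor = element_blank(),                     # hide minor grid lines
    axis.text.x = element_text(angle = 45, hjust = 1)       # tilt x-axis tick labels
  )
p_themed
```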
gapminder <- read.csv("https://ebmwhite.github.io/MATH0216/activities/gapminder.csv")
ggplot(gapminder, aes(x=year, y=lifeExp, group=country)) +
geom_line()
gapminder %>%
group_by(continent, year) %>%
summarise(lifeExp=median(lifeExp)) %>%
ggplot(aes(x=year, y=lifeExp, color=continent)) +
geom_line(size=1) +
geom_point(size=1.5)
gapminder %>%
filter(year==1952) %>%
ggplot( aes(gdpPercap, lifeExp, color = continent)) +
geom_point(aes(size = pop)) +
scale_x_log10()
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
gg <- ggplot(gapminder, aes(gdpPercap, lifeExp, color = continent, frame = year)) +
geom_point(aes(size = pop)) +
#geom_smooth(se = FALSE, method = "lm") +
scale_x_log10()
ggplotly(gg) %>%
highlight("plotly_hover")
Datasets: mpg, NYCairbnb2019.csv, gapminder
#library(openintro)
#cars
#library(tidyverse)
data(diamonds)
force(diamonds)
## # A tibble: 53,940 x 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
## # … with 53,930 more rows
#NYCairbnb2019.csv
surveys <- read.csv("https://ebmwhite.github.io/MATH0216/data/sample.csv")
gapminder <- read.csv("https://ebmwhite.github.io/MATH0216/activities/gapminder.csv")
UNCdata <- read.csv("http://ryanthornburg.com/wp-content/uploads/2015/05/UNC_Salares_NandO_2015-05-06.csv")
Here are some resources that may be useful quick reference guides for ggplot2: